Recurring Query Processing on Big Data
Advances in hardware, software, and networks have enabled applications in business enterprises, scientific and engineering disciplines, and social networks to generate data at unprecedented volume, variety, velocity, and veracity. Innovation in these domains is thus now limited by the ability to analyze and discover knowledge from the collected data in a timely and scalable fashion. To facilitate such large-scale big data analytics, the MapReduce computing paradigm and its open-source implementation Hadoop are among the most popular and widely used technologies. Hadoop's success as a competitor to traditional parallel database systems lies in its simplicity, ease of use, flexibility, automatic fault tolerance, superior scalability, and cost effectiveness, the last owing to its use of inexpensive commodity hardware that can scale to petabytes of data over thousands of machines. Recurring queries, executed repeatedly for long periods of time on rapidly evolving high-volume data, have become a bedrock component of most of these analytic applications. Efficient execution and optimization techniques must be designed to ensure the responsiveness and scalability of these recurring queries. In this dissertation, we thoroughly investigate topics in the area of recurring query processing on big data.
In this dissertation, we first propose a novel scalable infrastructure called Redoop that treats recurring queries over big evolving data as first-class citizens during query processing. This is in contrast to state-of-the-art MapReduce/Hadoop systems, which face significant challenges with recurring queries, including redundant computations, significant latencies, and huge application development efforts. Redoop offers innovative window-aware optimization techniques for recurring query execution, including adaptive window-aware data partitioning, window-aware task scheduling, and inter-window caching mechanisms. Redoop also retains the fault tolerance of MapReduce via automatic cache recovery and task re-execution support.
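The idea behind inter-window caching can be illustrated with a small sketch. This is not Redoop's implementation; the `PaneCache` class, the pane size, and the sum aggregate are all hypothetical choices made for illustration. The sketch splits the input into fixed-size panes, caches one partial aggregate per pane, and lets overlapping windows reuse the cached partials instead of rescanning the data.

```python
from collections import OrderedDict


class PaneCache:
    """Hypothetical cache of per-pane partial sums shared by overlapping windows."""

    def __init__(self, pane_size):
        self.pane_size = pane_size
        self.partials = OrderedDict()  # pane index -> cached partial sum
        self.computed = 0              # how many panes were actually scanned

    def pane_sum(self, data, pane):
        # Scan the raw data only if this pane has not been aggregated before.
        if pane not in self.partials:
            start = pane * self.pane_size
            self.partials[pane] = sum(data[start:start + self.pane_size])
            self.computed += 1
        return self.partials[pane]

    def window_sum(self, data, first_pane, panes_per_window):
        # Evict panes that have slid out of the current window.
        for old in [p for p in self.partials if p < first_pane]:
            del self.partials[old]
        last_pane = first_pane + panes_per_window
        return sum(self.pane_sum(data, p) for p in range(first_pane, last_pane))


data = list(range(12))             # toy input: 6 panes of 2 records each
cache = PaneCache(pane_size=2)
# A recurring query: window of 3 panes, sliding forward by 1 pane each run.
results = [cache.window_sum(data, first, 3) for first in range(4)]
print(results, cache.computed)
```

In this toy run the four window evaluations touch twelve pane slots in total, but only six panes are ever scanned; the remaining lookups are served from the cache.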
Second, we address the crucial need to accommodate hundreds or even thousands of recurring analytics queries that periodically execute over frequently updated data sets, e.g., latest stock transactions, new log files, or recent news feeds. For many applications, such recurring queries come with user-specified service-level agreements (SLAs), commonly expressed as the maximum allowed latency for producing results before their merits decay. On top of Redoop, we built a scalable multi-query sharing engine tailored for recurring workloads in the MapReduce infrastructure, called Helix. Helix deploys new sliced window-alignment techniques to create sharing opportunities among recurring queries without introducing additional I/O overheads or unnecessary data scans. Furthermore, Helix introduces a cost/benefit model for creating a sharing plan among the recurring queries, and a scheduling strategy for executing them to maximize the SLA satisfaction.
Third, recurring analytics queries tend to be expensive, especially when query processing consumes data sets in the hundreds of terabytes or more. Time-sensitive recurring queries, such as fraud detection, often come with tight response time constraints in the form of query deadlines. Data sampling is a popular technique for computing approximate results with an acceptable error bound while reducing resource consumption and thus improving query turnaround times. In this dissertation, we propose the first fast approximate query engine for recurring workloads in the MapReduce infrastructure, called Faro. Faro introduces two key innovations: (1) a deadline-aware sampling strategy that builds samples from the original data with reduced sample sizes compared to uniform sampling, and (2) adaptive resource allocation strategies that maximally improve the approximate results while still meeting the response time requirements specified in recurring queries.
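As a point of reference for the sampling discussion, the following sketch shows the uniform-sampling baseline that the abstract compares against, not Faro's deadline-aware strategy. The function name `approximate_mean`, the sample size, and the normal-approximation 95% interval are assumptions made for illustration.

```python
import math
import random
import statistics


def approximate_mean(data, sample_size, seed=0):
    """Estimate the mean from a uniform sample and report a 95% error bound.

    The bound uses the normal approximation (1.96 standard errors), which is
    an assumption that is reasonable only for reasonably large samples.
    """
    rng = random.Random(seed)
    sample = rng.sample(data, sample_size)          # uniform, without replacement
    estimate = statistics.fmean(sample)
    std_err = statistics.stdev(sample) / math.sqrt(sample_size)
    return estimate, 1.96 * std_err


data = list(range(100_000))                         # toy data set
estimate, half_width = approximate_mean(data, sample_size=1_000)
print(f"mean ~ {estimate:.1f} +/- {half_width:.1f} (exact: {statistics.fmean(data):.1f})")
```

Here only 1% of the records are read, and the error bound shrinks in proportion to the inverse square root of the sample size, which is the trade-off between accuracy and resource consumption that the paragraph above describes.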
In our comprehensive experimental study of each part of this dissertation, we demonstrate the superiority of the proposed strategies over state-of-the-art techniques in scalability, effectiveness, and robustness.
Generalized Area Spectral Efficiency: An Effective Performance Metric for Green Wireless Communications
Area spectral efficiency (ASE) was introduced as a metric to quantify the
spectral utilization efficiency of cellular systems. Unlike other performance
metrics, ASE takes into account the spatial property of cellular systems. In
this paper, we generalize the concept of ASE to study arbitrary wireless
transmissions. Specifically, we introduce the notion of affected area to
characterize the spatial property of arbitrary wireless transmissions. Based on
the definition of affected area, we define the performance metric, generalized
area spectral efficiency (GASE), to quantify the spatial spectral utilization
efficiency as well as the greenness of wireless transmissions. After
illustrating its evaluation for point-to-point transmission, we analyze the
GASE performance of several different transmission scenarios, including
dual-hop relay transmission, three-node cooperative relay transmission and
underlay cognitive radio transmission. We derive closed-form expressions for
the GASE metric of each transmission scenario under a Rayleigh fading environment
whenever possible. Through mathematical analysis and numerical examples, we
show that the GASE metric provides a new perspective on the design and
optimization of wireless transmissions, especially on the transmitting power
selection. We also show that introducing relay nodes can greatly improve the
spatial utilization efficiency of wireless systems. We illustrate that the GASE
metric can help optimize the deployment of underlay cognitive radio systems.
Comment: 11 pages, 8 figures, accepted by TCo
Inexact Bregman Proximal Gradient Method and its Inertial Variant with Absolute and Relative Stopping Criteria
The Bregman proximal gradient method (BPGM), which uses the Bregman distance
as a proximity measure in the iterative scheme, has recently been re-developed
for minimizing convex composite problems \textit{without} the global Lipschitz
gradient continuity assumption. This makes the BPGM appealing for a wide range
of applications, and hence it has received growing attention in recent years.
However, most existing convergence results are only obtained under the
assumption that the involved subproblems are solved \textit{exactly}, which is
not realistic in many applications. For the BPGM to be implementable and
practical, in this paper, we develop inexact versions of the BPGM by employing
either an absolute-type stopping criterion or a relative-type stopping
criterion for solving the subproblems. The iteration complexity of
$\mathcal{O}(1/k)$ and the convergence of the sequence are also established for
our iBPGM under some conditions. Moreover, we develop an inertial variant of
our iBPGM (denoted by v-iBPGM) and establish the iteration complexity of
$\mathcal{O}(1/k^{\gamma})$, where $\gamma\geq1$ is a restricted relative
smoothness exponent. When the smooth part in the objective has a Lipschitz
continuous gradient and the kernel function is strongly convex, we have
$\gamma=2$ and thus the v-iBPGM improves the iteration complexity of the iBPGM
from $\mathcal{O}(1/k)$ to $\mathcal{O}(1/k^2)$, in accordance with the
existing results on the exact accelerated BPGM. Finally, some preliminary
numerical experiments for solving the discrete quadratic regularized optimal
transport problem are conducted to illustrate the convergence behaviors of our
iBPGM and v-iBPGM under different inexactness settings.
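For readers unfamiliar with the basic scheme, here is a minimal sketch of the exact Bregman proximal gradient step in one textbook setting: the entropy kernel on the probability simplex, where the subproblem has a closed-form multiplicative solution. It is not the inexact or inertial method of the paper, and the objective, step size, and function name `bpgm_simplex` are assumptions chosen for illustration.

```python
import math


def bpgm_simplex(grad, x0, step, iters):
    """Exact Bregman proximal gradient on the simplex with the entropy kernel.

    Each step solves  min_x <g, x> + (1/step) * KL(x, x_k)  over the simplex,
    whose closed-form solution is x_i proportional to x_k[i] * exp(-step * g[i]).
    """
    x = list(x0)
    for _ in range(iters):
        g = grad(x)
        w = [xi * math.exp(-step * gi) for xi, gi in zip(x, g)]
        total = sum(w)
        x = [wi / total for wi in w]   # normalise back onto the simplex
    return x


# Toy smooth objective f(x) = 0.5 * ||x - target||^2, minimised at `target`,
# which already lies on the simplex.
target = [0.5, 0.3, 0.2]


def grad(x):
    return [xi - ci for xi, ci in zip(x, target)]


x = bpgm_simplex(grad, [1 / 3] * 3, step=1.0, iters=500)
print([round(v, 4) for v in x])
```

With this kernel the iterates stay strictly inside the simplex without any explicit projection, which is one reason Bregman distances are attractive as proximity measures. In the inexact setting of the abstract, the closed-form update would be replaced by an approximate subproblem solve that is terminated by a stopping criterion.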
A Complete Reference of the Analytical Synchrotron External Shock Models of Gamma-Ray Bursts
Gamma-ray bursts are the most luminous explosions in the universe. Their ejecta
are believed to move towards Earth with a relativistic speed. The interaction
between this "relativistic jet" and a circumburst medium drives a pair of
(forward and reverse) shocks. The electrons accelerated in these shocks radiate
synchrotron emission to power the broad-band afterglow of GRBs. The external
shock theory is an elegant theory, since it invokes a limited number of model
parameters, and has well-predicted spectral and temporal properties. On the
other hand, depending on many factors (e.g. the energy content, ambient density
profile, collimation of the ejecta, forward vs. reverse shock dynamics, and
synchrotron spectral regimes), there is a wide variety of models. These
models make distinct predictions for the afterglow decay indices, the
spectral indices, and the relations between them (the so-called "closure
relations"), which have been widely used to interpret the rich multi-wavelength
afterglow observations. This review article provides a complete reference of
all the analytical synchrotron external shock afterglow models by deriving the
temporal and spectral indices of all the models in all spectral regimes,
including some regimes that have not been published before. The review article
is designed to serve as a useful tool for afterglow observers to quickly
identify relevant models to interpret their data. The limitations of the
analytical models are reviewed, with a list of situations summarized when
numerical treatments are needed.
Comment: 119 pages, 45 figures, invited review accepted for publication in New
Astronomy Review
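As a worked illustration of what a closure relation looks like, consider the standard textbook case rather than any result specific to this review: an adiabatic forward shock in a constant-density medium, slow cooling, with electron energy index $p > 2$ and no energy injection. The convention $F_\nu \propto t^{-\alpha}\nu^{-\beta}$ and the symbols $\nu_m$ (injection frequency) and $\nu_c$ (cooling frequency) are assumptions of this sketch.

```latex
\begin{align}
  \nu_m < \nu < \nu_c:&\quad
    \alpha = \frac{3(p-1)}{4},\quad \beta = \frac{p-1}{2}
    \;\Longrightarrow\; \alpha = \frac{3\beta}{2}, \\
  \nu > \nu_c:&\quad
    \alpha = \frac{3p-2}{4},\quad \beta = \frac{p}{2}
    \;\Longrightarrow\; \alpha = \frac{3\beta-1}{2}.
\end{align}
```

Eliminating $p$ leaves a relation between two directly observable indices, so an observer can test a model against a measured light curve and spectrum without first fitting $p$.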
The extension of variability properties in gamma-ray bursts to blazars
Both gamma-ray bursts (GRBs) and blazars have relativistic jets pointing at a
small angle from our line of sight. Several recent studies suggested that these
two kinds of sources may share similar jet physics. In this work, we explore
the variability properties for GRBs and blazars as a whole. We find that the
correlation between minimum variability timescale (MTS) and Lorentz factor,
$\Gamma$, as found only in GRBs by Sonbas et al., can be extended to blazars
with a joint correlation of . The same
applies to the correlation as
found in GRBs, which can be well extended into blazars as well. These results
provide further evidence that the jets in these two kinds of sources are
similar despite the very different mass scales of their central engines.
Further investigations of the physical origin of these correlations are needed,
which can shed light on the nature of the jet physics.
Comment: 6 pages, 2 figures, accepted for publication in MNRA